-
Notifications
You must be signed in to change notification settings - Fork 32
Replace GraphMap’s map with a moka::sync::Cache #1224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
We should add some tests around the cache eviction edge cases:
|
modules/analysis/src/model.rs
Outdated
| pub(crate) fn set_approximate_memory_size(&self) -> PackageNode { | ||
| // Is there a better way to do this? | ||
| let size = size_of::<PackageNode>() | ||
| + self.sbom_id.len() | ||
| + self.node_id.len() | ||
| + self.purl.iter().fold(0, |acc, purl| | ||
| // use the json string length as an approximation of the memory size | ||
| acc + serde_json::to_string(purl).unwrap_or_else(|_| "".to_string()).len()) | ||
| + self.cpe.iter().fold(0, |acc, cpe| | ||
| // use the json string length as an approximation of the memory size | ||
| acc + serde_json::to_string(cpe).unwrap_or_else(|_| "".to_string()).len()) | ||
| + self.name.len() | ||
| + self.version.len() | ||
| + self.published.len() | ||
| + self.document_id.len() | ||
| + self.product_name.len() | ||
| + self.product_version.len(); | ||
|
|
||
| PackageNode { | ||
| approximate_memory_size: size.try_into().unwrap_or(u32::MAX), | ||
| ..self.clone() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to estimate how much memory this thing uses up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kinda assuming that saving the approximate_memory_size will give us savings later, it might not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to using the DeepSizeOf trait which has a derive macro and it seems to give much better results.
|
@JimFuller-RedHat I think I'm happy with this now. I'm going to keep it draft until #1217 goes in so you guys don't suffer the conflicts. But if you have idle time maybe worth a gander in case you think this went in the wrong direction. |
|
@chirino this looks really great (you are amazing!) ... and yes its likely to change a little bit after our PR (will review in full once we get that one into main) ... one thought was do we need to expose an env var to control params for easy configuration ? |
|
added cli option to allow for configuration, squashed and rebased off main. |
| #[arg( | ||
| id = "max-cache-size", | ||
| long, | ||
| env = "TRUSTD_MAX_CACHE_SIZE", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit mind to add a line on https://github.com/trustification/trustify/blob/main/docs/env-vars.md ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
| bytesize = "1.3" | ||
| chrono = { version = "0.4.35", default-features = false } | ||
| clap = "4" | ||
| moka = "0.12.10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you consider also exposing moka expiration policies TTL and TTI as env vars ? I was hoping that we can control both time and mem dimensions ... Moka looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah was considering it. But at least as this PR is, we should be able to avoid blowing up the memory usage which was the main goal of this PR.
|
|
||
| self.graph.write().insert(distinct_sbom_id.to_string(), g); | ||
| let g = Arc::new(g); | ||
| self.graph_cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it occurred to me there might be times when we want to suspend caching (might be useful for debugging for example)... not sure if this is important or not to consider for right now
| #[derive(Clone)] | ||
| pub struct AnalysisService { | ||
| graph: Arc<RwLock<GraphMap>>, | ||
| graph_cache: Arc<GraphMap>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would wait till @ctron reviews ... previously he reworked things in this area and it was a bit tricky.
| let all_graphs = service.load_all_graphs(&ctx.db).await?; | ||
| assert_eq!(all_graphs.len(), 2); | ||
|
|
||
| let big_sbom_size = service.cache_size_used() - small_sbom_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice !
JimFuller-RedHat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apart from a few minor observations - looks great to me - as @ctron did the final fixes on the service invoke stuff good to get his eyeballs on it as well (as it was more then a little fiddly).
ctron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. A few things to iron out.
And I really would to have more nested use sections with every PR rather than less 😉
modules/analysis/src/config.rs
Outdated
| env = "TRUSTD_MAX_CACHE_SIZE", | ||
| default_value = "200 MB", | ||
| help = "Maximum size of the graph cache. You can use binary units like 'kb', 'mb' or 'gb'.", | ||
| value_parser = u64_from_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep this aligned with the existing way to parse byte sizes: https://github.com/trustification/trustify/blob/a7e2855368841cd6e667b35aab5608d411330749/common/infrastructure/src/app/http.rs#L134-L141
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
| fmt, | ||
| ops::{Deref, DerefMut}, | ||
| }; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please merge and nest this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. still wish tooling automated this.
modules/analysis/src/service/mod.rs
Outdated
| &self, | ||
| query: impl Into<GraphQuery<'a>> + Debug, | ||
| distinct_sbom_ids: Vec<String>, | ||
| graphs: Vec<(String, Arc<PackageGraph>)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's not necessary to pass a Vec, so maybe use &[Arc<PackageGraph>] instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You right, will fix
|
|
||
| impl AnalysisService { | ||
| pub fn render_dot(&self, sbom: &str) -> Option<String> { | ||
| pub fn render_dot(&self, graph: Arc<PackageGraph>) -> Option<String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the Arc really required? Or is a &PackageGraph good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref is good enough.
server/src/profile/api.rs
Outdated
| }; | ||
|
|
||
| Ok(InitData { | ||
| analysis: AnalysisService::new_sized(run.analysis.max_cache_size), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be great it one could construct an AnalysisService from an AnalysisConfig directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change that around. And give AnalysisConfig a default.
modules/analysis/src/service/mod.rs
Outdated
| /// of having its own cache. So creating a new instance should be a deliberate choice. | ||
| #[allow(clippy::new_without_default)] | ||
| pub fn new() -> Self { | ||
| Self::new_sized(1024 * 1024 * 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried this gets overlooked. It looks like we use this for tests only. One way could be to add #[cfg(test]. Or drop it an have a const default value for tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just drop it and fix it in the tests.
modules/analysis/src/service/mod.rs
Outdated
| { | ||
| let query = query.into(); | ||
|
|
||
| // RwLock for reading hashmap<graph> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not rwlock anymore, is there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will drop
ctron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer the use statements nested 😉
Fixes issue guacsec#1218 Signed-off-by: Hiram Chirino <[email protected]>
Fixes issue #1218